Abstract

Parikh's theorem states that every Context Free Language (CFL) has the same Parikh image as that of a regular
language. A finite state automaton accepting such a regular language is called a Parikh-equivalent automaton. In
the worst case, the number of states in any non-deterministic Parikh-equivalent automaton is exponentially large
in the size of the Context Free Grammar (CFG). We associate a regularity width d with a CFG that measures
the closeness of the CFL with regular languages. The degree m of a CFG is one less than the maximum number
of variable occurrences in the right hand side of any production. Given a CFG with n variables, we construct
a Parikh-equivalent non-deterministic automaton whose number of states is upper bounded by a polynomial in
n(d^{2d(m+1)}), the degree of the polynomial being a small fixed constant. Our procedure is constructive and runs
in time polynomial in the size of the automaton. In the terminology of parameterized complexity, we prove that
constructing a Parikh-equivalent automaton for a given CFG is Fixed Parameter Tractable (FPT) when the degree
m and regularity width d are parameters. We also give an example from program verification domain where the
degree and regularity are small compared to the size of the grammar.



1 Introduction 



The Parikh image Π(w) of a word w over a finite alphabet Σ is a mapping Π(w) : Σ → ℕ such that for each letter
σ ∈ Σ, Π(w)(σ) is the number of times σ occurs in w. The Parikh image of a language is the set of Parikh images
of its words. The well known Parikh's theorem [11] states that for every Context Free Language (CFL) L, there is
a regular language with the same Parikh image as that of L. This fundamental result in automata theory has many
applications, including verification [TUlIIllll], equational horn clauses [15] and automata theory itself.
Apart from the equivalence itself, the complexity of computing a representation of the regular language or the
Parikh image is crucial to the efficiency of many applications. There are examples where any Parikh-equivalent
non-deterministic automaton is exponentially large in the size of the Context Free Grammar (CFG) (see Sect. 2.1).
However, as is frequently the case, instances of this problem arising in applications have some structure that can be
exploited to compute smaller automata. In this paper, we introduce a systematic way of measuring the "closeness"
of a given CFL to regular languages and show that the closer the CFL is to regular languages, the smaller the
Parikh-equivalent non-deterministic automata will be. More precisely,

1. We define a number called regularity width d that can be computed from a given CFG. If the CFG happens
to be a regular grammar, then its regularity width will be 1.

2. As an illustration, we show that instances of CFGs arising from a verification application [3] will have small
values of regularity width d.

3. With n denoting the number of variables in the given CFG and degree m being one less than the maximum
number of variable occurrences in the right hand side of any production, we show that a Parikh-equivalent
non-deterministic finite automaton can be constructed whose number of states is upper bounded by a polynomial
in n(d^{2d(m+1)}), the degree of the polynomial being a small fixed constant.

Finer study of automaton complexity has been done before, e.g., [H [13]. In [3], a parameter called number 
of procedure variables p is introduced and it is proved that when p is a fixed constant, a Parikh-equivalent non- 
deterministic automaton can be constructed whose number of states is polynomial in n. The result in this paper is 
both a generalization and refinement of the results in [4]. It is a generalization since the regularity width d defined 
here is linear in the number of procedure variables p for CFGs arising from problems being considered in [3], and d
can be computed for any given CFG. Our result is a refinement since [3] gives automata sizes of the kind while
we give automata sizes of the kind n(d^{2d(m+1)}). To get a rough idea of the kind of difference this can make asymptotically,
consider the ratio between n^p and 2^p n taken from [3]: for n = 100 and p = 10, the ratio is 9.8 × 10^14 while for
n = 150 and p = 20, the ratio is 2.1 × 10^35.
To systematically explore the possibility of finding efficient algorithms for restricted cases of computationally
hard problems, Downey and Fellows introduced parameterized complexity [2]. If n is the size of an input instance
and d is its parameter (that is usually much smaller than n), then algorithms with running time O(n^{f(d)}) are called
XP algorithms and those with running time O(f(d)) poly(n) are called Fixed Parameter Tractable (FPT) algorithms.
Here, poly is any polynomial and f is any computable function (usually required to be single exponential or less to
be of any immediate practical use). It is known that the class of XP algorithms is strictly more powerful than the
class of FPT algorithms. The procedure we give for constructing the automaton runs in time polynomial in the size
of the output and hence is FPT.

Our results build on techniques based on pumping for CFLs [11, 7]. Additional techniques and arguments are
needed to closely control where and how pumping is done so that for CFGs with small regularity width, smaller 
automata suffice. Techniques from graph theory and tree decompositions [51 Chapter 6] are used in arguments on 
size of the constructed automata. 

Related work: A finer analysis of the automata size has been done in [13] with focus on the size of the alphabet.
In [5], the size of automata is related to finite index CFGs. Some of the techniques here are inspired by insights
given in [5]. Since Parikh images of CFLs are semilinear, they can be represented in Presburger arithmetic. In [15],
the complexity of Presburger formula representation has been considered. The algebraic view of Parikh's theorem
has been studied in [TH 12] . 

2 Preliminaries 

2.1 Context Free Grammars 

Let Σ be a finite set of symbols, called terminals. A word w over Σ is any finite sequence of terminals. The empty
sequence is denoted by ε. The word obtained by concatenating i ∈ ℕ copies of w is denoted by w^i. The set of all
words over Σ is denoted Σ*. A language L is any subset of Σ*. The Parikh image Π(w) of a word is a mapping
Π(w) : Σ → ℕ such that for each σ ∈ Σ, Π(w)(σ) is the number of times σ occurs in w. The Parikh image of ε is
denoted by 0. The Parikh image of a language L ⊆ Σ* is the set of mappings {Π(w) | w ∈ L}.
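
As a small illustration of these definitions, the sketch below computes Parikh images of words and of a finite language. The function names and the representation (a dictionary per word, tuples ordered like the alphabet for a language) are choices made here for illustration only.

```python
from collections import Counter

def parikh_image(word, alphabet):
    """Map each letter of the alphabet to its number of occurrences in word."""
    counts = Counter(word)
    return {letter: counts[letter] for letter in alphabet}

def parikh_image_of_language(words, alphabet):
    """Set of Parikh images of a finite collection of words, as tuples ordered like alphabet."""
    return {tuple(parikh_image(w, alphabet)[a] for a in alphabet) for w in words}
```

The words ab and ba have the same Parikh image, which is why a Parikh-equivalent automaton is free to read letters in a different order than the grammar produces them.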

We follow the notation of [8, Chapter 5]. Let G = (V, Σ, P, S) be a CFG with a set V = {A_1, …, A_n} of variables,
a finite set Σ of terminals, a finite set P ⊆ V × (V ∪ Σ)* of productions and an axiom S ∈ V. We denote words over
Σ by w, w_1 etc. A production (A, w_0 A_1 w_1 ⋯ A_r w_r) ∈ P is denoted as A → w_0 A_1 w_1 ⋯ A_r w_r. The degree m of G is
defined to be −1 + max{r | A → w_0 A_1 w_1 ⋯ A_r w_r is a production}.
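
The degree can be read off the productions directly. A minimal sketch, assuming productions are given as (head, body) pairs where body is a list of symbols and a symbol counts as a variable exactly when it belongs to the given set; this encoding is an assumption of the example, not of the paper.

```python
def degree(productions, variables):
    """Return m = -1 + max{r}, where r counts variable occurrences on a right-hand side."""
    return -1 + max(
        sum(1 for symbol in body if symbol in variables)
        for _head, body in productions
    )
```

For a regular grammar every right-hand side has at most one variable occurrence, so the degree is 0 whenever some production contains a variable.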

The set of words L(G) over Σ generated by G is called the language of G and is called a CFL. Parikh's theorem [11]
states that the Parikh image of every CFL is equal to that of a regular language. Given a CFG G, a finite automaton
accepting a language whose Parikh image is the same as that of L(G) is called a Parikh-equivalent automaton for
G. Consider the CFG G_n with productions {A_j → A_{j−1} A_{j−1} | 2 ≤ j ≤ n} ∪ {A_1 → a} and axiom S = A_n. The
language of G_n is the singleton set {a^{2^{n−1}}} and hence the smallest Parikh-equivalent non-deterministic automaton
has 2^{n−1} + 1 states.
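
The doubling in G_n is easy to see by expanding the grammar. The sketch below is a hypothetical encoding in which variable A_j is written as the string "Aj"; since every variable of G_n has exactly one production, the derivation is deterministic.

```python
def g_n(n):
    """Productions of G_n as a dict: A_j -> A_{j-1} A_{j-1} for 2 <= j <= n, and A_1 -> a."""
    productions = {f"A{j}": [f"A{j - 1}", f"A{j - 1}"] for j in range(2, n + 1)}
    productions["A1"] = ["a"]
    return productions

def derive(symbol, productions):
    """Yield of the unique parse tree rooted at symbol (each variable has one production)."""
    if symbol not in productions:
        return symbol  # a terminal stands for itself
    return "".join(derive(s, productions) for s in productions[symbol])
```

The single word generated from A_n has length 2^{n−1}, so an automaton accepting exactly the words with this Parikh image needs a path of that many transitions.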

We assume familiarity with parse trees [8, Chapter 5] and yield of parse trees. We denote parse trees by t, t_1
etc. and their yields by Y(t), Y(t_1) etc. The variable labelling the root of a parse tree t is denoted by root(t). For
any variable A ∈ V, the parse tree t is defined to be A-recurrence free if in any path from the root to a leaf of t, A
occurs at most once. We define t to be A-occurrence free if A does not occur anywhere in t. If t is A-recurrence free
and rooted at A, then any proper subtree of t is A-occurrence free. We write t = t_1 · t_2 to denote that t_1 is a parse
tree except that exactly one leaf η is labelled by a variable, say A, instead of a terminal; the tree t_2 is a parse tree
rooted at A; and the parse tree t is obtained from t_1 by replacing the leaf η with t_2. The height of a parse tree t is
denoted by h(t).

Using the fact that a Parikh-equivalent automaton need not preserve the order in which letters occur in accepted 
words, the following lemma allows us to manipulate parse trees so that the automaton need not keep track of too 
many occurrences of the same variable. This is also one of the key observations used in the automaton construction 
given in [5]. 

Lemma 1 ([5]). Suppose t_1, t_2 are two parse trees and A ∈ V is a variable such that t_1 is not A-recurrence free and
t_2 is not A-occurrence free. Then there are parse trees t'_1, t'_2 such that

1. root(t'_1) = root(t_1), root(t'_2) = root(t_2) and

2. Π(Y(t_1)) + Π(Y(t_2)) = Π(Y(t'_1)) + Π(Y(t'_2)) and

3. t'_1 is A-recurrence free.

Proof. Since there is a path from the root to a leaf of t_1 in which A occurs at least twice, we can write t_1 = t · t' · t''
where t' and t'' are rooted at A. The parse tree t_2 can similarly be written as t_2 = t'_2 · t''_2, where t''_2 is rooted at A.
Replace t_1 by t · t'' and t_2 by t'_2 · t' · t''_2 (i.e., remove the subtree t' from t_1 and insert it into t_2). This will reduce the
number of nodes in t_1. Repeat this process until t_1 is A-recurrence free. This process will terminate after finitely
many steps since there are only finitely many nodes in t_1 to begin with. □

2.2 Graphs and Tree Decompositions 

Definition 2 (Tree decomposition, treewidth). A tree decomposition of an undirected graph H = (V, E) is a pair
(T, (B_η)_{η ∈ Nodes(T)}), where T is a tree and (B_η)_{η ∈ Nodes(T)} is a family of subsets of V (called bags) such that:

• For all v ∈ V, the set {η ∈ Nodes(T) | v ∈ B_η} is nonempty and connected in T.

• For every edge (v_1, v_2) ∈ E, there is an η ∈ Nodes(T) such that v_1, v_2 ∈ B_η.

The width of such a decomposition is the number max{|B_η| | η ∈ Nodes(T)} − 1. The treewidth tw(H) of H is the
minimum of the widths of all tree decompositions of H.

If a graph H has an edge between every pair of vertices among {v_1, …, v_r}, then {v_1, …, v_r} is said to induce a
clique in H.

Lemma 3 Lemma 6.49]). If the set of vertices {v_1, …, v_r} induces a clique in the graph H, then every tree
decomposition of H will have a bag B with {v_1, …, v_r} ⊆ B.
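
The two conditions on a tree decomposition can be checked mechanically. The following is a minimal sketch under stated assumptions: bags is a dictionary from tree nodes to sets of graph vertices, tree_edges lists the edges of T, and the fact that T is a tree is taken for granted rather than verified. All names are hypothetical.

```python
def is_tree_decomposition(vertices, edges, tree_edges, bags):
    """Check the two conditions of a tree decomposition.

    bags maps each node of T to a set of graph vertices; tree_edges lists the
    edges of T. That T really is a tree is assumed, not checked.
    """
    neighbours = {node: set() for node in bags}
    for a, b in tree_edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    for v in vertices:
        # Nodes whose bag contains v must be nonempty and connected in T.
        nodes = {node for node, bag in bags.items() if v in bag}
        if not nodes:
            return False
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            for nxt in neighbours[stack.pop()] & nodes:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if seen != nodes:
            return False
    # Every edge of the graph must be contained in some bag.
    return all(any(u in bag and v in bag for bag in bags.values()) for u, v in edges)

def width(bags):
    """Width of a decomposition: size of the largest bag minus one."""
    return max(len(bag) for bag in bags.values()) - 1
```

For a triangle a, b, c with a pendant vertex d attached to c, the bags {a, b, c} and {c, d} joined by one tree edge form a decomposition of width 2, and the clique {a, b, c} sits inside a single bag as Lemma 3 requires.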

3 The Regularity Measure 

In this section, we define the regularity width and give an example application where regularity width is much lower 
than the size of the CFG. 

We define a binary accessibility relation ⇝ between variables in V as follows. We have A ⇝ A' if there is a
production of the form A → (V ∪ Σ)* A' (V ∪ Σ)*. The reachability relation ⇝⁺ is the transitive closure of the
accessibility relation ⇝.

Definition 4 (Reminder graph and regularity width). For a CFG G = (V, Σ, P, S), its reminder graph R(G) is a
graph whose set of vertices is V and set of edges E is as follows. For every production A → w_0 A_1 w_1 ⋯ A_r w_r in G
with r ≥ 2 and every 1 ≤ i, j ≤ r, the following edges are present, where ⇝⁺ denotes the reachability relation:

i ≠ j implies (A_i, A_j) ∈ E (1)

∀A' ∈ V \ {A_j}, A_i ⇝⁺ A' implies (A', A_j) ∈ E (2)

The regularity width d of G is defined to be tw(R(G)) + 1.
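
The edge rules (1) and (2) translate into a short procedure. This is a sketch under assumptions: productions are (head, body) pairs with body a list of symbols, the relation in rule (2) is read as reachability (transitive closure of accessibility), and self-loops are omitted since they do not affect treewidth. Function names are hypothetical.

```python
def reachability(productions, variables):
    """Transitive closure of accessibility: A reaches A' via one or more productions."""
    reach = {a: set() for a in variables}
    for head, body in productions:
        reach[head].update(s for s in body if s in variables)
    changed = True
    while changed:
        changed = False
        for a in variables:
            new = set().union(*(reach[b] for b in reach[a])) - reach[a]
            if new:
                reach[a] |= new
                changed = True
    return reach

def reminder_graph(productions, variables):
    """Edges of the reminder graph as a set of two-element frozensets (self-loops dropped)."""
    reach = reachability(productions, variables)
    edges = set()
    for _head, body in productions:
        occ = [s for s in body if s in variables]
        if len(occ) < 2:
            continue  # only productions with r >= 2 contribute edges
        for i, a_i in enumerate(occ):
            for j, a_j in enumerate(occ):
                if i != j and a_i != a_j:
                    edges.add(frozenset((a_i, a_j)))      # rule (1)
                for a_prime in reach[a_i]:
                    if a_prime != a_j:
                        edges.add(frozenset((a_prime, a_j)))  # rule (2)
    return edges
```

For the grammar S → AB, A → aC, C → c, B → b this produces the triangle on A, B, C: the edge between A and B comes from rule (1), and the edges from C to both A and B come from rule (2) because C is reachable from A.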

The intuition behind the definition of reminder graph is explained in the following diagram of a parse tree.

Figure 1: An illustration for reminder graph

Suppose we are trying to imitate this parse tree through a finite state automaton and we go down the tree rooted
at A_1. The edge between A_2 and A_3 reminds us that both A_2 and A_3 have to be followed up later. Going down the
production starting from A_1, suppose we reach the variable A'. The edge between A' and A_2 reminds us that A_2 is
yet to be followed up. The edge between A' and A_1 reminds us that we have already gone down a tree rooted at A_1,
so we should avoid going down subtrees that are also rooted at A_1, thus avoiding the necessity to keep track of too
many A_1s.

Since regular grammars have at most one variable in the right hand side of any production, their reminder graphs
do not have any edges. Hence, the regularity width of regular grammars is 1. Following is a more interesting example
from [3]: if there are programs running in many threads in parallel and synchronizing on common actions, a certain
verification problem reduces to reasoning about Parikh images of CFLs derived from the programs. If there is a
program point C_1 from which some action a can be performed and program point C_2 can be reached, then we create
a production C_1 → a C_2. If a subroutine P_0 is invoked at point C_1 and then C_2 is reached, we create a production
C_1 → P_0 C_2. Let us call variables like P_0 and C_2 ports. The number of ports is twice the number of control locations
that invoke some subroutine. The only edges in the reminder graph are those between ports and other variables.
This gives an easy way to construct a tree decomposition of the reminder graph: if C_1, …, C_r is a list of all program
points, then create a path with r nodes, each node η_i associated with a bag B_i, 1 ≤ i ≤ r. If P is the set of all ports,
then setting B_i = P ∪ {C_i} for every 1 ≤ i ≤ r will give us a tree decomposition of the reminder graph with each bag
containing at most |P| + 1 elements. Hence, for a program with |P| ports, the associated CFG has regularity width at most |P| + 1.

4 The Automaton Construction 

Let U be any set. A multiset 𝐮 over U is a mapping 𝐮 : U → ℕ. We sometimes use the notation ⟦u_1, u_3, u_3⟧ to
denote the multiset that maps u_1 to 1, u_3 to 2 and all others to 0. The empty multiset is denoted ∅. Given two
multisets 𝐮_1 and 𝐮_2, their sum 𝐮_1 ⊕ 𝐮_2 is defined to be the mapping such that (𝐮_1 ⊕ 𝐮_2)(u) = 𝐮_1(u) + 𝐮_2(u) for all
u ∈ U. If 𝐮(u) ≥ 1, then 𝐮 ⊖ ⟦u⟧ is defined to be the mapping such that (𝐮 ⊖ ⟦u⟧)(u') = 𝐮(u') for all u' ∈ U \ {u} and

(𝐮 ⊖ ⟦u⟧)(u) = 𝐮(u) − 1.
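
These multiset operations correspond closely to Python's `collections.Counter`. A minimal sketch, with hypothetical helper names `msum` and `mremove` standing for ⊕ and ⊖:

```python
from collections import Counter

def msum(u1, u2):
    """Multiset sum: multiplicities are added pointwise."""
    return u1 + u2

def mremove(u, x):
    """Remove one occurrence of x from u; only defined when u contains x."""
    if u[x] < 1:
        raise ValueError(f"{x!r} does not occur in the multiset")
    result = Counter(u)
    result[x] -= 1
    return +result  # unary plus drops entries whose count fell to zero
```

For instance, `Counter(["u1", "u3", "u3"])` represents ⟦u_1, u_3, u_3⟧, and removing one u_3 from it leaves ⟦u_1, u_3⟧.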

The automaton we construct will have as its states sequences of reminder pairs defined below.

Definition 5. A reminder pair R is a tuple (A, v) where A ∈ V ∪ {⊥} is a variable or a special symbol ⊥ and
v : V → ℕ is a multiset over V. The first component of this tuple will be referred to as R.Current and the second
component as R.Followup. A reminder sequence RS = R_1 ⋯ R_p is a sequence of reminder pairs, where p = |RS|
is the length of RS. The reminder sequence RS = R_1 ⋯ R_p ends with A if R_p.Current = A. The variable A occurs
i times in RS if |{j ∈ ℕ | 1 ≤ j ≤ p, R_j.Current = A}| = i.

Given a CFG G, we define a finite state automaton A(G) by describing its states and transition relation below.

Definition 6. Let G be a CFG. Then A(G) is a finite state automaton whose initial state is the reminder sequence
consisting of the single reminder pair (S, ∅), where S is the axiom of G. A reminder sequence RS is a state of A(G)
if for any variable A ∈ V, A occurs at most twice in RS and it is reachable from (S, ∅) by the transition relation ⇒
specified below. In the following, RS could be the empty sequence ε too.

1. If RS · (A, ∅) is a reminder sequence and A → w_0 A_1 w_1 ⋯ A_r w_r is a production with r ≥ 1, then
RS · (A, ∅) ⇒^{w_0 ⋯ w_r} RS · (A_j, ⟦A_1, …, A_r⟧ ⊖ ⟦A_j⟧) for every 1 ≤ j ≤ r.

2. If RS · (A, v) is a reminder sequence, v ≠ ∅ and A → w_0 A_1 w_1 ⋯ A_r w_r is a production with r ≥ 1, then
RS · (A, v) ⇒^{w_0 ⋯ w_r} RS · (A, v) · (A_j, ⟦A_1, …, A_r⟧ ⊖ ⟦A_j⟧) for every 1 ≤ j ≤ r.

3. If A → w is a production, then RS · (A, ∅) ⇒^{w} RS · (⊥, ∅).

4. If A → w is a production, then RS · (A, v ⊕ ⟦A'⟧) ⇒^{w} RS · (A', v) for every A' ∈ V.

5. RS · (A, v ⊕ ⟦A'⟧) · (⊥, ∅) ⇒^{ε} RS · (A', v) for every A' ∈ V.

The final state of A(G) is (⊥, ∅).
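
To make the five rules concrete, the following sketch enumerates the reachable states of A(G) by breadth-first search. It is an illustration under assumptions, not the paper's implementation: a state is a tuple of (Current, Followup) pairs with Followup stored as a sorted tuple, productions are (head, body) pairs, rules 1 to 4 are taken to read the terminals of the production used and rule 5 to read the empty word, and all identifiers (`BOTTOM`, `successors`, `build_automaton`, ...) are hypothetical.

```python
from collections import deque

BOTTOM = "_|_"  # stands for the special symbol ⊥

def remove_one(multiset, x):
    """Multiset (sorted tuple) with one occurrence of x removed."""
    items = list(multiset)
    items.remove(x)
    return tuple(sorted(items))

def successors(state, productions, variables):
    """Yield (word_read, next_state) pairs following rules 1-5 of Definition 6."""
    *rest, (cur, follow) = state
    rest = tuple(rest)
    if cur == BOTTOM:
        if rest:  # rule 5: resume a variable remembered in the previous pair
            *rest2, (_a, v) = rest
            for x in set(v):
                yield "", tuple(rest2) + ((x, remove_one(v, x)),)
        return
    for head, body in productions:
        if head != cur:
            continue
        occ = [s for s in body if s in variables]
        word = "".join(s for s in body if s not in variables)
        if occ:
            for a_j in set(occ):
                pair = (a_j, remove_one(tuple(occ), a_j))
                # rule 1 replaces the last pair, rule 2 appends a new one
                yield word, (rest if not follow else state) + (pair,)
        elif not follow:
            yield word, rest + ((BOTTOM, ()),)                      # rule 3
        else:
            for x in set(follow):
                yield word, rest + ((x, remove_one(follow, x)),)    # rule 4

def is_state(state):
    """No variable may occur as Current more than twice."""
    currents = [c for c, _ in state if c != BOTTOM]
    return all(currents.count(c) <= 2 for c in set(currents))

def build_automaton(productions, variables, axiom):
    """Return (states, transitions) reachable from the initial state ((axiom, ()),)."""
    init = ((axiom, ()),)
    seen, queue, transitions = {init}, deque([init]), []
    while queue:
        s = queue.popleft()
        for word, t in successors(s, productions, variables):
            if is_state(t):
                transitions.append((s, word, t))
                if t not in seen:
                    seen.add(t)
                    queue.append(t)
    return seen, transitions
```

On the grammar S → aSb | ab this yields two states: the initial state with a self-loop reading ab (rule 1) and a transition reading ab to the final state (rule 3). The accepted words (ab)^k have the same Parikh images as the words a^k b^k generated by the grammar.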

If RS is a reminder sequence, we denote by RS{i} = {A ∈ V | (R_i.Followup ⊕ ⟦R_i.Current⟧)(A) ≥ 1} the set
of those variables that occur in the i-th reminder pair of RS. The following lemma is the motivation for using the
treewidth of the reminder graph to define regularity width.

Lemma 7. Let G be a CFG with degree m and RS be a state of A(G). The set of variables ⋃_{1 ≤ i ≤ |RS|} RS{i} induces
a clique in the reminder graph of G. For any 1 ≤ i ≤ |RS|, Σ_{A ∈ V} R_i.Followup(A) ≤ m.

Proof. Let RS be any state of A(G) and 1 ≤ i, j ≤ |RS| be positions of RS. By induction on the minimum number
ℓ of transition relation pairs RS_1 ⇒ RS_2 that have to be traversed to reach RS from (S, ∅), we will prove the
following claims:

I If R_i.Followup ≠ ∅, then there is some production A → w_0 A_1 w_1 ⋯ A_r w_r such that RS{i} ⊆ {A_1, …, A_r} and
r ≥ 2.

II If R_i.Followup = ∅, then (R_i.Current, ∅) is the last reminder pair of RS.

III If i < j, then for any variable A ∈ RS{j}, A is reachable from R_i.Current.

IV Σ_{A ∈ V} R_i.Followup(A) ≤ m.

In the following, it is clearly seen that claim IV holds, so it will not be mentioned explicitly.
Base case ℓ = 0: Here, RS = (S, ∅) for which all the claims clearly hold.

Induction step: We distinguish between 5 cases depending on the type of transition relation (in Def. 6) that is
traversed for the last time to reach RS from (S, ∅).

Case (1): A → w_0 A_1 w_1 ⋯ A_r w_r is a production with r ≥ 1 and RS_1 · (A, ∅) ⇒^{w_0 ⋯ w_r} RS_1 · (A_k, ⟦A_1, …, A_r⟧ ⊖
⟦A_k⟧) = RS. Claim I is satisfied since ⟦A_1, …, A_r⟧ ⊖ ⟦A_k⟧ ≠ ∅ implies that r ≥ 2. Claim II is satisfied since by
induction hypothesis, none of the reminder pairs in RS_1 can be of the form (A', ∅) for any A' ∈ V. Claim III is
satisfied since A' is accessible from A for any A' ∈ {A_1, …, A_r}.

Case (2): v ≠ ∅, A → w_0 A_1 w_1 ⋯ A_r w_r is a production with r ≥ 1 and RS_1 · (A, v) ⇒^{w_0 ⋯ w_r} RS_1 · (A, v) ·
(A_k, ⟦A_1, …, A_r⟧ ⊖ ⟦A_k⟧) = RS. Claim I is satisfied since ⟦A_1, …, A_r⟧ ⊖ ⟦A_k⟧ ≠ ∅ implies that r ≥ 2. Claim II is
satisfied since by induction hypothesis, none of the reminder pairs in RS_1 can be of the form (A', ∅) for any A' ∈ V.
Claim III is satisfied since A' is accessible from A for any A' ∈ {A_1, …, A_r}.

Case (3): A → w is a production and RS_1 · (A, ∅) ⇒^{w} RS_1 · (⊥, ∅) = RS. Claim I is satisfied by any reminder pair
in RS_1 by induction hypothesis and is vacuously true for (⊥, ∅). Claim II is satisfied since by induction hypothesis,
none of the reminder pairs in RS_1 can be of the form (A', ∅) for any A' ∈ V. Claim III is satisfied since it is satisfied
in RS_1 by induction hypothesis and ⊥ is not a variable.

Case (4): A → w is a production and RS_1 · (A, v ⊕ ⟦A'⟧) ⇒^{w} RS_1 · (A', v) = RS. By induction hypothesis, there
is a production A'' → w_0 A_1 w_1 ⋯ A_r w_r with r ≥ 2 such that {A, A'} ∪ {A''' | v(A''') ≥ 1} ⊆ {A_1, …, A_r}. Hence,
RS satisfies claim I. Claim II is satisfied since by induction hypothesis, none of the reminder pairs in RS_1 can be of
the form (A''', ∅) for any A''' ∈ V. Claim III is satisfied by induction hypothesis.

Case (5): RS_1 · (A, v ⊕ ⟦A'⟧) · (⊥, ∅) ⇒^{ε} RS_1 · (A', v) = RS. By induction hypothesis, there is a production
A'' → w_0 A_1 w_1 ⋯ A_r w_r with r ≥ 2 such that {A, A'} ∪ {A''' | v(A''') ≥ 1} ⊆ {A_1, …, A_r}. Hence, RS satisfies claim
I. Claim II is satisfied since by induction hypothesis, none of the reminder pairs in RS_1 can be of the form (A''', ∅)
for any A''' ∈ V. Claim III is satisfied by induction hypothesis. This completes the induction step and hence the
claims I, II, III and IV are true.

Now we are ready to prove the lemma. Let RS be any state. By claim I, if |RS{i}| > 1, then there is some
production A → w_0 A_1 w_1 ⋯ A_r w_r such that RS{i} ⊆ {A_1, …, A_r} and r ≥ 2. By (1) in Def. 4, the reminder graph
has an edge between every pair of variables in RS{i}. Let 1 ≤ i < j ≤ |RS|. It only remains to prove that the
reminder graph has an edge between any variable in RS{i} and any variable in RS{j}. By claim II, R_i.Followup ≠ ∅
and by claim I, there is a production A'' → w_0 A_1 w_1 ⋯ A_r w_r with r ≥ 2 such that RS{i} ⊆ {A_1, …, A_r}. From
claim III, any variable A ∈ RS{j} is reachable from R_i.Current. Since R_i.Current ∈ RS{i}, (2) in Def. 4 implies that the
reminder graph has an edge between any variable in RS{i} and any variable in RS{j}. □

If RS is a state of A(G), then any tree decomposition of the reminder graph of G will have a bag containing all
variables occurring in RS, by Lemma 7 and Lemma 3. Since no variable can occur more than twice in RS, this gives
us the required upper bound on the size of A(G).

Lemma 8. For a CFG G with regularity width d, degree m and n variables, the number of states of A(G) is bounded
by a polynomial of n(d^{2d(m+1)})e|P|, where e is the maximum number of terminal occurrences in the right hand side
of any production and |P| is the number of productions.

Proof. By Lemma 7, the set of all variables V_1 occurring in a reminder sequence RS that is a state of A(G) induces a
clique in the reminder graph of G. By Lemma 3, any tree decomposition (and hence an optimal tree decomposition)
of the reminder graph of G has a bag that contains all variables in V_1, so |V_1| ≤ d. Since any one variable can occur
at most twice in RS, there are at most 2d reminder pairs in RS. For any graph, there is an optimal tree decomposition
in which the number of nodes is at most the number of vertices in the graph Lemma 11.9]. Consider such a tree
decomposition of the reminder graph of G where each bag has at most d variables and there are at most n bags. The
number of reminder sequences of length at most 2d that can be constructed from variables in any one bag of this tree
decomposition is at most d^{2d(m+1)}. Since there are at most n nodes in this tree decomposition, the number of states
in A(G) is upper bounded by n(d^{2d(m+1)}).

The automaton A(G) has transition relations that read words. If we need the usual automaton that reads single
letters, the size will increase by a multiplicative factor that is a polynomial in |P| and e. □

The idea behind A(G) is that it will go down one path of a parse tree, remembering siblings along the path that
will have to be visited later. In addition, if the automaton goes down a parse tree t rooted at a variable A, Lemma 1
is used to make one of the subtrees of t A-recurrence free so that A(G) can go down this subtree without having
to remember any more siblings rooted at A. However, the application of Lemma 1 may introduce recurrences in
other variables. To handle this, we need the notion of a valuation. The set of multisets over the set of parse trees is
denoted by T.

Definition 9. Let RS be a reminder sequence. A valuation val for RS is a mapping val : {1, …, |RS|} → T such
that for each 1 ≤ i ≤ |RS|, val(i) = ⟦t_1, …, t_r⟧ implies R_i.Followup = ⟦root(t_1), …, root(t_r)⟧. The Parikh image of
such a valuation is defined as Π(val) = Σ_{1 ≤ i ≤ |RS|, val(i)(t) ≥ 1} Π(Y(t)^{val(i)(t)}), the sum of Parikh images of yields of all
parse trees occurring in val. A parse tree t is said to occur in val at level i if val(i)(t) ≥ 1.

Analogous to the notion of compact parse trees introduced in [5], we define the notion of compact valuations.

Definition 10. A valuation val for a reminder sequence RS is defined to be compact if the following properties are
satisfied:

CP.I If A ∈ V is such that R_j.Current = R_k.Current = A, 1 ≤ j < k ≤ |RS| and R_k.Followup ≠ ∅, then all parse
trees occurring in val at level k + 1 or higher are A-occurrence free.

CP.II For every 1 ≤ i ≤ |RS|, if all parse trees occurring in val at level i + 1 or lower are R_i.Current-occurrence
free, then all parse trees occurring in val at level i + 2 or higher are R_i.Current-occurrence free.

CP.III For every 1 ≤ i ≤ |RS|, if there is a parse tree occurring in val at level i + 1 or lower that is not
R_i.Current-occurrence free, then all parse trees occurring in val at level i + 2 or higher are R_i.Current-recurrence free.

Now we will try to give some intuition behind the above definition. Suppose A(G) at state RS is at the root of
a parse tree t whose children are roots of parse trees t_1, …, t_r. Then A(G) will go down one of the subtrees, say t_1.
This fact is remembered by the last reminder pair R_{|RS|} by setting R_{|RS|}.Current = root(t_1) and R_{|RS|}.Followup =
⟦root(t_2), …, root(t_r)⟧, which intuitively means that root(t_1) is the label of the root of the subtree that is currently
being handled, and subtrees rooted at root(t_2), …, root(t_r) are to be followed up later. The property CP.I above
means that if A(G) has already seen a variable A twice in the subtree currently being handled, first at level j and
then at level k, then A will never be seen again in the current subtree (subtrees of the current subtree will end up in
level k + 1 or higher). This will ensure that A(G) will not need reminder sequences that are too long. To ensure that
CP.I is always maintained, A(G) has to be careful about which subtree to go into at each stage. This is captured by
CP.III: if at stage i, A(G) is at a subtree rooted at R_i.Current, then all further recurrences of R_i.Current are moved
(using Lemma 1) into one of the children of the current subtree and moved into R_{i+1}.Followup to be followed up
later, ensuring that the subtree being handled right now is free of R_i.Current-recurrences (and hence any subtree
that gets into i + 2 or higher levels is free of R_i.Current-recurrences too). The property CP.II captures the fact
that if at stage i, Lemma 1 was not used to push all R_i.Current-recurrences into one of the subtrees at the (i + 1)-th stage
(so that all parse trees occurring at level i + 1 or lower are R_i.Current-occurrence free), it was because none of the
subtrees had any occurrence of R_i.Current at all (so that all parse trees occurring at level i + 2 or higher are also
R_i.Current-occurrence free).

If RS is a state of A(G), val is a valuation for RS and 1 ≤ i ≤ |RS|, then RS ↾ i is the reminder sequence R_1 ⋯ R_i and
val ↾ i is the restriction of val to {1, …, i}. We denote by V[val ↑ i] = {A ∈ V | ∃j ≥ i, A occurs in parse tree t, val(j)(t) ≥
1} the set of all variables labelling parse trees that occur in val at level i or higher. Similarly, V[val ↓ i] = {A ∈ V |
∃j ≤ i, A occurs in parse tree t, val(j)(t) ≥ 1} is the set of all variables labelling parse trees that occur in val at level
i or lower.

Proposition 11. If val is a compact valuation for RS, then val ↾ (|RS| − 1) is a compact valuation for RS ↾ (|RS| − 1).

Proof. If A ∈ V is such that R_j.Current = R_k.Current = A, 1 ≤ j < k ≤ |RS|, R_k.Followup ≠ ∅ and k ≤ |RS| − 1,
then by compactness of val, all parse trees occurring in val ↾ (|RS| − 1) at level k + 1 or higher are A-occurrence
free. Therefore, val ↾ (|RS| − 1) satisfies CP.I.

For every 1 ≤ i ≤ |RS| − 1, if all parse trees occurring in val at level i + 1 or lower are R_i.Current-occurrence
free, then by compactness of val, all parse trees occurring in val ↾ (|RS| − 1) at level i + 2 or higher are
R_i.Current-occurrence free. Hence val ↾ (|RS| − 1) satisfies CP.II.

For every 1 ≤ i ≤ |RS| − 1, if there is a parse tree occurring in val at level i + 1 or lower that is not
R_i.Current-occurrence free, then by compactness of val, all parse trees occurring in val ↾ (|RS| − 1) at level i + 2 or higher are
R_i.Current-recurrence free. Hence val ↾ (|RS| − 1) satisfies CP.III. □

Lemma 12. If a valuation val for a reminder sequence RS satisfies properties CP.I and CP.II of Def. 10, then there
is a compact valuation val' for RS such that Π(val') = Π(val) and for all 1 ≤ i ≤ |RS|, V[val ↓ i] ⊆ V[val' ↓ i].

Proof. By induction on |RS|. For the base case |RS| = 1, the property CP.III is vacuously true for val and hence it
is compact.

For the induction step, we will describe a compactification procedure.

Step 1: Suppose for some 1 ≤ i ≤ |RS| − 1, there is a parse tree t_1 occurring in val at level i' ≤ i + 1 that is not
R_i.Current-occurrence free and there is a parse tree t_2 occurring in val at level |RS| that is not R_i.Current-recurrence
free (we will call such a pair of parse trees (t_1, t_2) a CP.III violation witness). Since there is a path from the root
to a leaf of t_2 in which R_i.Current occurs twice, we have t_2 = t'_2 · t' · t''_2 where t' and t''_2 are rooted at R_i.Current.
Similarly, t_1 = t'_1 · t''_1, where t''_1 is rooted at R_i.Current. Replace t_1 by t'_1 · t' · t''_1 and t_2 by t'_2 · t''_2 and let the resulting
valuation be val_1. The insertion of t' into t_1 will not introduce any violation of CP.I: if there are 1 ≤ j < k < i'
with R_k.Followup ≠ ∅ and R_j.Current = R_k.Current = A ∈ V, then t_2 is A-occurrence free and hence so is t'. The
insertion of t' into t_1 will not introduce any violation of CP.II: for any 1 ≤ j ≤ i' − 1, if all parse trees occurring
at level j + 1 or lower are R_j.Current-occurrence free, then t_2 is R_j.Current-occurrence free and hence so is t'.
The removal of t' from t_2 also does not introduce any violation of CP.I or CP.II since removal of subtrees does not
introduce new labels. Hence, val_1 continues to satisfy CP.I and CP.II. In addition, Π(val_1) = Π(val) and for all
1 ≤ j ≤ |RS|, V[val ↓ j] ⊆ V[val_1 ↓ j].

Step 2: Iterate step 1 as long as there are CP.III violation witnesses. Since each iteration reduces the number
of nodes in one of the parse trees at level |RS|, this loop will stop after a finite number of rounds. At this stage,
suppose we have the updated valuation val_r that does not have any CP.III violation witnesses and satisfies CP.I and
CP.II. Also suppose that ⟦t'_1, t'_2, …⟧ are the trees remaining behind at level |RS| after removal of subtrees from the
original ones. The valuation val_r ↾ (|RS| − 1) for the reminder sequence RS ↾ (|RS| − 1) satisfies CP.I and CP.II
(refer to proof of Prop. 11) and hence by induction hypothesis, there is a compact valuation val'_r for RS ↾ (|RS| − 1)
such that Π(val'_r) = Π(val_r ↾ (|RS| − 1)) and for all 1 ≤ j ≤ |RS| − 1, V[val_r ↾ (|RS| − 1) ↓ j] ⊆ V[val'_r ↓ j]. Let val'
be the valuation for RS such that for each 1 ≤ i < |RS|, val'(i) = val'_r(i) and val'(|RS|) = ⟦t'_1, t'_2, …⟧. It is clear
that Π(val') = Π(val) and for all 1 ≤ j ≤ |RS|, V[val ↓ j] ⊆ V[val' ↓ j]. We will now prove that val' satisfies CP.I. If
not, there is A ∈ V such that R_j.Current = R_k.Current = A, R_k.Followup ≠ ∅ and there is a parse tree t occurring
in val' at level k + 1 or higher that is not A-occurrence free. Due to the compactness of val'_r, t can not occur at level
|RS| − 1 or lower, hence t occurs in val' at level |RS|. Then t is one among {t'_1, t'_2, …}, say t'_1, which is obtained
from t_1 by removing subtrees, where t_1 occurs in val at level |RS|. Hence, t_1 is not A-occurrence free, contradicting
the fact that val satisfies CP.I. Hence, val' satisfies CP.I. We will show that val' satisfies CP.II. If not, for some
1 ≤ i ≤ |RS|, all parse trees occurring in val' at level i + 1 or lower are R_i.Current-occurrence free and there is a
parse tree t occurring at level i + 2 or higher that is not R_i.Current-occurrence free. Due to the compactness of val'_r,
t can not occur at level |RS| − 1 or lower, hence t occurs in val' at level |RS|. Then t is one among {t'_1, t'_2, …}, say t'_1,
which is obtained from t_1 by removing subtrees, where t_1 occurs in val at level |RS|. Hence, t_1 is not R_i.Current-occurrence
free, contradicting the fact that val satisfies CP.II (recall that V[val ↓ i + 1] ⊆ V[val' ↓ i + 1]). Hence, val' satisfies
CP.II.

Step 3: The valuation val_r obtained in step 2 does not have CP.III violation witnesses. But val' is obtained
from val_r by performing some changes, so val' may have CP.III violation witnesses. Iterate step 2 as long as
the resulting val' has CP.III violation witnesses. Since each iteration reduces the number of nodes in parse trees
occurring at level |RS|, this loop will stop after a finite number of rounds. The resulting valuation val' has no CP.III
violation witnesses, satisfies CP.I and CP.II, Π(val') = Π(val) and val' ↾ (|RS| − 1) is compact. In addition, for
all 1 ≤ j ≤ |RS|, V[val ↓ j] ⊆ V[val' ↓ j]. We will prove that val' satisfies CP.III (and hence is compact). Suppose
not. For some 1 ≤ i ≤ |RS|, there is a parse tree t_1 occurring in val' at level i + 1 or lower that is not
R_i.Current-occurrence free and there is a parse tree t_2 occurring at level i + 2 or higher that is not R_i.Current-recurrence free.
Since val' ↾ (|RS| − 1) is compact, t_2 can not occur at level |RS| − 1 or lower, so it has to occur at level |RS|. But
then, (t_1, t_2) is a CP.III violation witness, a contradiction. Hence, val' satisfies CP.III and hence it is compact. □

The automaton $\mathcal{A}(G)$ deletes some portions of parse trees and accepts a sequence of terminals corresponding to the deleted portion. To track the progress of $\mathcal{A}(G)$, we define below a function $f$ that measures the amount of information contained in a parse tree. For convenience, we consider $\bot$ as a special parse tree whose root is its only node and which is rooted at $\bot$. The yield of $\bot$ is defined to be empty: $Y(\bot) = \epsilon$.

Definition 13. The function $f$ from the set of parse trees to the set of natural numbers is defined as follows:

• $f(\bot) = 1$.

• $f(t) = 2$ if the root of $t$ is a variable and all its children are labelled by terminals. This case applies even if the root of $t$ does not have any children.

• If the root of a parse tree $t$ has at least one child labelled by a variable, then $f(t) = 1 + \sum_{1 \le i \le r} f(t_i)$, where $t_1, \ldots, t_r$ are the subtrees of $t$ whose roots are children of the root of $t$ and that are labelled by variables.

The function $f$ is extended to the domain of valuations as follows. If $val$ is a valuation for a reminder sequence $RS$, then
$f(val) = \sum_{1 \le i \le |RS|} \sum_{val(i)(t) \ge 1} f(t) \cdot val(i)(t)$.
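
The following is a minimal Python sketch of Definition 13, not part of the original construction. The `Tree` class, the marker `BOT` for the special tree, and the representation of a valuation as a list of `Counter` objects (one multiset of parse trees per level) are assumptions made only for illustration.

```python
from collections import Counter
from dataclasses import dataclass

BOT = "BOT"  # hypothetical marker for the special one-node parse tree


@dataclass(frozen=True)
class Tree:
    """Hypothetical parse tree: a child is a Tree (variable) or a str (terminal)."""
    label: str
    children: tuple = ()


def f(t: Tree) -> int:
    """Size measure of Definition 13 for a single parse tree."""
    if t.label == BOT:
        return 1
    variable_children = [c for c in t.children if isinstance(c, Tree)]
    if not variable_children:
        # Root is a variable whose children (if any) are all terminals.
        return 2
    return 1 + sum(f(c) for c in variable_children)


def f_val(val) -> int:
    """Extension of f to valuations: sum of f(t) weighted by multiplicity."""
    return sum(f(t) * mult for level in val for t, mult in level.items())


leaf = Tree("B", ("b",))
tree = Tree("S", ("a", leaf, Tree("A")))
print(f(tree), f_val([Counter({leaf: 2}), Counter({tree: 1})]))
```

Because `Tree` is a frozen dataclass with tuple children, trees are hashable and can serve directly as keys of the per-level multisets.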

To describe runs of our automaton, we maintain, in addition to the word read so far, a valuation for the current state. This is formalized in the definition below.

Definition 14. A configuration $c$ is a tuple $(RS, w, val, t)$ where $RS$ is a state of $\mathcal{A}(G)$, $w \in \Sigma^*$ is a word, $val$ is a compact valuation for $RS$ and $t$ is a parse tree rooted at $R_{|RS|}$.Current. The size $|c|$ of $c$ is defined to be $f(val) + f(t)$.

Now we will show that the transitions of $\mathcal{A}(G)$ can be used to traverse a parse tree in such a way that the size of the reminder sequences needed never exceeds a small bound. If $RS$ is a reminder sequence and $A \in V$ is a variable, we say that $RS$ ends with $A$ when $R_{|RS|}$.Current $= A$. For a parse tree $t$, $w(t)$ is the word over $\Sigma$ obtained by concatenating the labels of those children of root($t$) that are labelled by terminals. If the root does not have any children or all children of the root are labelled by variables, then $w(t) = \epsilon$. Immediate subtrees of $t$ are those subtrees of $t$ whose roots are children of the root of $t$. The proof of the following lemma is based on the intuition given after Def. 10.
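
As an illustration of the notions just introduced, here is a small Python sketch of $w(t)$, immediate subtrees, the yield and the Parikh image. The `Tree` representation and all function names are hypothetical, and terminals are assumed to be single characters so that words can be plain strings.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    """Hypothetical parse tree: a child is a Tree (variable) or a str (terminal)."""
    label: str
    children: tuple = ()


def w(t: Tree) -> str:
    """Concatenate the terminal labels among the children of the root."""
    return "".join(c for c in t.children if isinstance(c, str))


def immediate_subtrees(t: Tree) -> list:
    """Subtrees whose roots are variable-labelled children of the root."""
    return [c for c in t.children if isinstance(c, Tree)]


def yield_of(t: Tree) -> str:
    """Left-to-right sequence of terminals at the leaves."""
    return "".join(c if isinstance(c, str) else yield_of(c) for c in t.children)


def parikh(word: str) -> Counter:
    """Parikh image: number of occurrences of each letter."""
    return Counter(word)


# S -> a B b with B -> b
t = Tree("S", ("a", Tree("B", ("b",)), "b"))
below = sum((parikh(yield_of(s)) for s in immediate_subtrees(t)), Counter())
print(w(t), yield_of(t), parikh(yield_of(t)) == parikh(w(t)) + below)
```

The last line checks, on one example, that the Parikh image of the yield of $t$ is the sum of the Parikh image of $w(t)$ and those of the yields of its immediate subtrees, which is the bookkeeping behind property VI of the lemma below.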

Lemma 15. Suppose $G$ is a CFG and $c = (RS, w, val, t)$ is a configuration satisfying the following properties:

I If $A \in V$ is such that $R_j$.Current $= R_k$.Current $= A$, $1 \le j < k \le |RS|$ and $R_k$.Followup $\ne \emptyset$, then all immediate subtrees of $t$ are $A$-occurrence free.

II For every $1 \le i < |RS|$, if all parse trees occurring in $val$ at level $i + 1$ or lower are $R_i$.Current-occurrence free and $R_{i+1}$.Followup $\ne \emptyset$, then $t$ is $R_i$.Current-occurrence free.

III For every $1 \le i < |RS|$, if there is a parse tree occurring in $val$ at level $i + 1$ or lower that is not $R_i$.Current-occurrence free and $R_{i+1}$.Followup $\ne \emptyset$, then $t$ is $R_i$.Current-recurrence free.

If the size of such a configuration $c$ is greater than 1, then there is another configuration $c' = (RS', ww', val', t')$ that satisfies the above three properties in addition to the following ones:

IV $RS \stackrel{w'}{\Longrightarrow} RS'$,

V $|c'| < |c|$ and

VI $\Pi(val) + \Pi(w) + \Pi(Y(t)) = \Pi(val') + \Pi(ww') + \Pi(Y(t'))$.

Proof of Lemma 15. Let $RS = RS_1 \cdot (A, v)$ with $RS_1$ possibly equal to the empty sequence $\epsilon$ and $A$ possibly equal to the special symbol $\bot$. We distinguish between cases based on whether $v = \emptyset$ and whether some children of root($t$) are labelled with variables. In all the following cases, properties IV, V and VI are clearly satisfied, so they will not be mentioned explicitly.

Case 1: $v = \emptyset$ and all children of the root of $t$ are labelled with terminals. In this case, $A \to w(t)$ is a production of $G$. We can take $RS' = RS_1 \cdot (\bot, \emptyset)$, $w' = w(t)$, $val' = val$ and $t' = \bot$. The new configuration $c'$ satisfies properties I, II and III since $t' = \bot$.
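
Properties V and VI are stated above to be clear in every case. As a worked check for Case 1 only, the following derivation spells them out from Definition 13 and the choices $val' = val$, $t' = \bot$, $w' = w(t)$. It assumes that $t$ is rooted at a variable, so that $f(t) = 2$ and $Y(t) = w(t)$, and that the Parikh image of the empty yield of $\bot$ is the zero vector.

```latex
\begin{align*}
|c'| &= f(val') + f(t') = f(val) + f(\bot) = f(val) + 1 \\
     &< f(val) + 2 = f(val) + f(t) = |c|, \\[4pt]
\Pi(val') + \Pi(ww') + \Pi(Y(t'))
     &= \Pi(val) + \Pi(w) + \Pi(w(t)) + \Pi(Y(\bot)) \\
     &= \Pi(val) + \Pi(w) + \Pi(Y(t)).
\end{align*}
```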

Case 2: $v = \emptyset$ and at least one child of the root of $t$ is labelled with a variable. In this case, $A \to w_0 A_1 w_1 \cdots A_r w_r$ is a production of $G$ such that $w(t) = w_0 \cdots w_r$. Let $t_1, \ldots, t_r$ be the immediate subtrees of $t$ rooted at $A_1, \ldots, A_r$ respectively. If $r \ge 2$ and $RS_1$ ends with $A' \in V$ such that one of the trees $t_1, \ldots, t_r$ is $A'$-occurrence free, that one should be chosen as $t'$ for the next configuration $c'$. If $r \ge 2$ and $RS_1$ ends with $A' \in V$ such that none of the trees among $t_1, \ldots, t_r$ is $A'$-occurrence free, make one of them $A'$-recurrence free by applying Lemma [T] and choose that as $t'$ for the next configuration $c'$. Assume without loss of generality (wlog) that if $r \ge 2$ and $RS_1$ ends with $A'$, then $t_1$ is $A'$-occurrence free when possible and $A'$-recurrence free otherwise. Let $RS' = RS_1 \cdot (A_1, \llbracket A_2, \ldots, A_r \rrbracket)$. If variable $A_1$ occurs more than once in $RS_1$, then $c$ would violate property I that $t_1$ is $A_1$-occurrence free. Hence $A_1$ occurs at most once in $RS_1$ and hence $RS'$ is a state of $\mathcal{A}(G)$. Let $val_1$ be such that $val_1 \upharpoonright (|RS| - 1) = val \upharpoonright (|RS| - 1)$ and $val_1(|RS|) = \llbracket t_2, \ldots, t_r \rrbracket$. This new valuation $val_1$ satisfies CPI: since $val_1 \upharpoonright (|RS| - 1)$ already satisfies CPI (due to compactness of $val$), it is enough to observe that for any $A' \in V$ with $R_j$.Current $= R_k$.Current $= A'$ and $1 \le j < k \le |RS_1|$, we have that $t_1, \ldots, t_r$ are $A'$-occurrence free since $c$ satisfies property I. The new valuation $val_1$ satisfies CPII: since $val_1 \upharpoonright (|RS| - 1)$ already satisfies CPII (due to compactness of $val$), it is enough to observe that for any $j \le |RS_1| - 1$, if all parse trees occurring in $val_1$ at level $j + 1$ or lower are $R_j$.Current-occurrence free, then so are all parse trees occurring in $val$ at level $j + 1$ or lower, and hence, $t_1, \ldots, t_r$ are $R_j$.Current-occurrence free since $c$ satisfies property II. Let $val'$ be the compact valuation given by Lemma 12 such that $\Pi(val') = \Pi(val_1)$. We take $RS' = RS_1 \cdot (A_1, \llbracket A_2, \ldots, A_r \rrbracket)$, $w' = w(t)$, $val'$ is the one obtained above by applying Lemma 12 and $t' = t_1$.

The new configuration $c' = (RS', ww', val', t')$ given above satisfies property I: let $A' \in V$ be such that $R_j$.Current $= R_k$.Current $= A'$, $1 \le j < k \le |RS'|$ and $R_k$.Followup $\ne \emptyset$. If $k \le |RS_1|$, then $t_1$ is $A'$-occurrence free since $c$ satisfies property I. If $k = |RS'|$ (in which case $A' = A_1$), $j < |RS_1|$ and all parse trees occurring in $val$ at level $j + 1$ or lower are $A'$-occurrence free, then since $c$ satisfies property II, $t$ is $A'$-occurrence free and so is $t_1$. If $k = |RS'|$, $j < |RS_1|$ and a parse tree occurring in $val$ at level $j + 1$ or lower is not $A'$-occurrence free, then since $c$ satisfies property III, $t$ is $A'$-recurrence free and hence immediate subtrees of $t_1$ are $A'$-occurrence free (since $t_1$ is rooted at $A' = A_1$). If $k = |RS'|$ and $j = |RS_1|$, $R_k$.Followup $\ne \emptyset$ implies that we chose $t_1$ to be $R_j$.Current-recurrence free at the beginning of this case, hence immediate subtrees of $t_1$ are $R_j$.Current-occurrence free.

The new configuration also satisfies property II: suppose for some $1 \le i < |RS'|$, all parse trees occurring in $val'$ at level $i + 1$ or lower are $R_i$.Current-occurrence free and $R_{i+1}$.Followup $\ne \emptyset$. If $i < |RS'| - 1$, then $V[val \downarrow i + 1] = V[val_1 \downarrow i + 1] \subseteq V[val' \downarrow i + 1]$ implies that all parse trees occurring in $val$ at level $i + 1$ or lower are $R_i$.Current-occurrence free. Since $c$ satisfies property II, $t$ is $R_i$.Current-occurrence free and so is $t_1$. If $i = |RS'| - 1$, $r \ge 2$ implies that $t_1$ is $R_i$.Current-occurrence free by the choice of $t_1$ made at the beginning of this case. If $i = |RS'| - 1$, $r = 1$ implies that $R_{i+1}$.Followup $= \emptyset$ so this case is not applicable.

Finally, the new configuration satisfies property III: for some $1 \le i < |RS'|$, suppose there is a parse tree occurring in $val'$ at level $i + 1$ or lower that is not $R_i$.Current-occurrence free and $R_{i+1}$.Followup $\ne \emptyset$. If $i < |RS_1|$ and a parse tree occurring in $val$ at level $i + 1$ or lower is not $R_i$.Current-occurrence free, then since $c$ satisfies property III, $t$ is $R_i$.Current-recurrence free and hence so is $t_1$. If $i < |RS_1|$ and all parse trees occurring in $val$ at level $i + 1$ or lower are $R_i$.Current-occurrence free, then since $c$ satisfies property II, $t$ is $R_i$.Current-occurrence free and so is $t_1$. If $i = |RS_1|$ and $r = 1$, then $R_{i+1}$.Followup $= \emptyset$ so this case does not apply. If $i = |RS_1|$ and $r \ge 2$, then $t_1$ is $R_i$.Current-recurrence free by the choice of $t_1$ we made at the beginning of this case.

Case 3: $RS = RS_1 \cdot (A, \llbracket A_1, \ldots, A_r \rrbracket) \cdot (\bot, \emptyset)$. Let $val(|RS_1| + 1) = \llbracket t_1, \ldots, t_r \rrbracket$ such that $t_1, \ldots, t_r$ are rooted at $A_1, \ldots, A_r$ respectively. If $r \ge 2$ and $RS_1$ ends with $A' \in V$ such that one of the trees $t_1, \ldots, t_r$ is $A'$-occurrence free, that one should be chosen as $t'$ for the next configuration $c'$. If $r \ge 2$ and $RS_1$ ends with $A' \in V$ such that none of the trees among $t_1, \ldots, t_r$ is $A'$-occurrence free, make one of them $A'$-recurrence free by applying Lemma [T] and choose that as $t'$ for the next configuration $c'$. Assume wlog that if $r \ge 2$ and $RS_1$ ends with $A'$, then $t_1$ is $A'$-occurrence free when possible and $A'$-recurrence free otherwise. Let $RS' = RS_1 \cdot (A_1, \llbracket A_2, \ldots, A_r \rrbracket)$. If variable $A_1$ occurs more than once in $RS_1$, it leads to a contradiction since compactness of $val$ implies that $t_1$ is $A_1$-occurrence free. Hence $A_1$ occurs at most once in $RS_1$ and hence $RS'$ is a state of $\mathcal{A}(G)$. Let $val_1$ be such that $val_1 \upharpoonright (|RS'| - 1) = val \upharpoonright (|RS'| - 1)$ and $val_1(|RS'|) = \llbracket t_2, \ldots, t_r \rrbracket$. This new valuation $val_1$ satisfies CPI and CPII since it is obtained from $val$ by moving subtrees within parse trees occurring at level $|RS'|$ and removing a parse tree from the same level. Let $val'$ be the compact valuation given by Lemma 12 such that $\Pi(val') = \Pi(val_1)$. We take $RS' = RS_1 \cdot (A_1, \llbracket A_2, \ldots, A_r \rrbracket)$, $w' = \epsilon$, $val'$ is the one obtained above by applying Lemma 12 and $t' = t_1$.

The new configuration $c' = (RS', ww', val', t')$ given above satisfies property I: let $A' \in V$ be such that $R_j$.Current $= R_k$.Current $= A'$, $1 \le j < k \le |RS'|$ and $R_k$.Followup $\ne \emptyset$. If $k < |RS'|$, then $t_1$ is $A'$-occurrence free by compactness of $val$. If $k = |RS'|$ (in which case $A' = A_1$) and $j = |RS_1|$, then $R_k$.Followup $\ne \emptyset$ implies that $t_1$ is $A'$-recurrence free by the choice of $t_1$ we made at the beginning of this case, hence immediate subtrees of $t_1$ are $A'$-occurrence free. If $k = |RS'|$, $j < |RS_1|$ and all parse trees occurring in $val$ at level $j + 1$ or lower are $A'$-occurrence free, then $t_1$ is $A'$-occurrence free by compactness of $val$. If $k = |RS'|$, $j < |RS_1|$ and a parse tree occurring in $val$ at level $j + 1$ or lower is not $A'$-occurrence free, then compactness of $val$ implies that $t_1$ is $A'$-recurrence free, so immediate subtrees of $t_1$ are $A'$-occurrence free.

The new configuration also satisfies property II: suppose for some $1 \le i < |RS'|$, all parse trees occurring in $val'$ at level $i + 1$ or lower are $R_i$.Current-occurrence free and $R_{i+1}$.Followup $\ne \emptyset$. If $i < |RS'| - 1$, then $V[val \downarrow i + 1] = V[val_1 \downarrow i + 1] \subseteq V[val' \downarrow i + 1]$ implies that all parse trees occurring in $val$ at level $i + 1$ or lower are $R_i$.Current-occurrence free. By compactness of $val$, $t_1$ is $R_i$.Current-occurrence free. If $i = |RS'| - 1$, $r \ge 2$ implies that $t_1$ is $R_i$.Current-occurrence free by the choice of $t_1$ made at the beginning of this case. If $i = |RS'| - 1$, $r = 1$ implies that $R_{i+1}$.Followup $= \emptyset$, so this case does not apply.

Finally, the new configuration satisfies property III: for some $1 \le i < |RS'|$, suppose there is a parse tree occurring in $val'$ at level $i + 1$ or lower that is not $R_i$.Current-occurrence free and $R_{i+1}$.Followup $\ne \emptyset$. If $i < |RS_1|$ and a parse tree occurring in $val$ at level $i + 1$ or lower is not $R_i$.Current-occurrence free, then by compactness of $val$, $t_1$ is $R_i$.Current-recurrence free. If $i < |RS_1|$ and all parse trees occurring in $val$ at level $i + 1$ or lower are $R_i$.Current-occurrence free, then by compactness of $val$, $t_1$ is $R_i$.Current-occurrence free. If $i = |RS_1|$ and $r = 1$, then $R_{i+1}$.Followup $= \emptyset$, so this case does not apply. If $i = |RS_1|$ and $r \ge 2$, then $t_1$ is $R_i$.Current-recurrence free by the choice of $t_1$ we made at the beginning of this case.

Case 4: $RS = RS_1 \cdot (A, \llbracket A_1, \ldots, A_r \rrbracket)$, $r \ge 1$ and all children of the root of $t$ are labelled by terminals. In this case, $A \to w(t)$ is a production of $G$. Let $val(|RS_1| + 1) = \llbracket t_1, \ldots, t_r \rrbracket$ such that $t_1, \ldots, t_r$ are rooted at $A_1, \ldots, A_r$ respectively. If $r \ge 2$ and $RS_1$ ends with $A' \in V$ such that one of the trees $t_1, \ldots, t_r$ is $A'$-occurrence free, that one should be chosen as $t'$ for the next configuration $c'$. If $r \ge 2$ and $RS_1$ ends with $A' \in V$ such that none of the trees among $t_1, \ldots, t_r$ is $A'$-occurrence free, make one of them $A'$-recurrence free by applying Lemma [T] and choose that as $t'$ for the next configuration $c'$. Assume wlog that if $r \ge 2$ and $RS_1$ ends with $A'$, then $t_1$ is $A'$-occurrence free when possible and $A'$-recurrence free otherwise. Let $RS' = RS_1 \cdot (A_1, \llbracket A_2, \ldots, A_r \rrbracket)$. If variable $A_1$ occurs more than once in $RS_1$, it leads to a contradiction since compactness of $val$ implies that $t_1$ is $A_1$-occurrence free. Hence $A_1$ occurs at most once in $RS_1$ and hence $RS'$ is a state of $\mathcal{A}(G)$. Let $val_1$ be such that $val_1 \upharpoonright (|RS'| - 1) = val \upharpoonright (|RS'| - 1)$ and $val_1(|RS'|) = \llbracket t_2, \ldots, t_r \rrbracket$. This new valuation $val_1$ satisfies CPI and CPII since it is obtained from $val$ by moving subtrees within parse trees occurring at level $|RS'|$ and removing a parse tree from the same level. Let $val'$ be the compact valuation given by Lemma 12 such that $\Pi(val') = \Pi(val_1)$. We take $RS' = RS_1 \cdot (A_1, \llbracket A_2, \ldots, A_r \rrbracket)$, $w' = w(t)$, $val'$ is the one obtained above by applying Lemma 12 and $t' = t_1$.

The new configuration $c' = (RS', ww', val', t')$ given above satisfies property I: let $A' \in V$ be such that $R_j$.Current $= R_k$.Current $= A'$, $1 \le j < k \le |RS'|$ and $R_k$.Followup $\ne \emptyset$. If $k < |RS'|$, then $t_1$ is $A'$-occurrence free by compactness of $val$. If $k = |RS'|$ (in which case $A' = A_1$) and $j = |RS_1|$, then $R_k$.Followup $\ne \emptyset$ implies that $r \ge 2$ and hence by the choice of $t_1$ made at the beginning of this case, $t_1$ is $A'$-recurrence free and hence immediate subtrees of $t_1$ are $A'$-occurrence free. If $k = |RS'|$, $j < |RS_1|$ and all parse trees occurring in $val$ at level $j + 1$ or lower are $A'$-occurrence free, then by compactness of $val$, $t_1$ is $A'$-occurrence free. If $k = |RS'|$, $j < |RS_1|$ and a parse tree occurring in $val$ at level $j + 1$ or lower is not $A'$-occurrence free, then by compactness of $val$, $t_1$ is $A'$-recurrence free and hence immediate subtrees of $t_1$ are $A'$-occurrence free (since $t_1$ is rooted at $A' = A_1$).

The new configuration also satisfies property II: suppose for some $1 \le i < |RS'|$, all parse trees occurring in $val'$ at level $i + 1$ or lower are $R_i$.Current-occurrence free and $R_{i+1}$.Followup $\ne \emptyset$. If $i < |RS_1|$, then $V[val \downarrow i + 1] = V[val_1 \downarrow i + 1] \subseteq V[val' \downarrow i + 1]$ implies that all parse trees occurring in $val$ at level $i + 1$ or lower are $R_i$.Current-occurrence free. By compactness of $val$, $t_1$ is $R_i$.Current-occurrence free. If $i = |RS_1|$, $r \ge 2$ implies that $t_1$ is $R_i$.Current-occurrence free by the choice of $t_1$ made at the beginning of this case. If $i = |RS_1|$, $r = 1$ implies that $R_{i+1}$.Followup $= \emptyset$ so this case does not apply.

Finally, the new configuration satisfies property III: for some $1 \le i < |RS'|$, suppose there is a parse tree occurring in $val'$ at level $i + 1$ or lower that is not $R_i$.Current-occurrence free and $R_{i+1}$.Followup $\ne \emptyset$. If $i < |RS_1|$ and a parse tree occurring in $val$ at level $i + 1$ or lower is not $R_i$.Current-occurrence free, then by compactness of $val$, $t_1$ is $R_i$.Current-recurrence free. If $i < |RS_1|$ and all parse trees occurring in $val$ at level $i + 1$ or lower are $R_i$.Current-occurrence free, then by compactness of $val$, $t_1$ is $R_i$.Current-occurrence free. If $i = |RS_1|$ and $r = 1$, then $R_{i+1}$.Followup $= \emptyset$ so this case does not apply. If $i = |RS_1|$ and $r \ge 2$, then $t_1$ is $R_i$.Current-recurrence free by the choice of $t_1$ we made at the beginning of this case.

Case 5: $RS = RS_1 \cdot (A, \llbracket A_1, \ldots, A_r \rrbracket)$, $r \ge 1$ and at least one child of the root of $t$ is labelled with a variable. In this case, $A \to w_0 A'_1 w_1 \cdots A'_s w_s$ is a production of $G$ such that $w(t) = w_0 \cdots w_s$. Let $t_1, \ldots, t_s$ be the immediate subtrees of $t$ rooted at $A'_1, \ldots, A'_s$ respectively. If $s \ge 2$ and one of the trees $t_1, \ldots, t_s$ is $A$-occurrence free, that one should be chosen as $t'$ for the next configuration $c'$. If $s \ge 2$ and none of the trees among $t_1, \ldots, t_s$ is $A$-occurrence free, make one of them $A$-recurrence free by applying Lemma [T] and choose that as $t'$ for the next configuration $c'$. Assume wlog that if $s \ge 2$, $t_1$ is $A$-occurrence free when possible and $A$-recurrence free otherwise. Let $RS' = RS_1 \cdot (A, \llbracket A_1, \ldots, A_r \rrbracket) \cdot (A'_1, \llbracket A'_2, \ldots, A'_s \rrbracket)$. If variable $A'_1$ occurs more than once in $RS_1 \cdot (A, \llbracket A_1, \ldots, A_r \rrbracket)$, then $c$ would violate property I that $t_1$ is $A'_1$-occurrence free. Hence $A'_1$ occurs at most once in $RS_1 \cdot (A, \llbracket A_1, \ldots, A_r \rrbracket)$ and hence $RS_1 \cdot (A, \llbracket A_1, \ldots, A_r \rrbracket) \cdot (A'_1, \llbracket A'_2, \ldots, A'_s \rrbracket)$ is a state of $\mathcal{A}(G)$. Let $val_1$ be such that $val_1 \upharpoonright (|RS|) = val \upharpoonright (|RS|)$ and $val_1(|RS| + 1) = \llbracket t_2, \ldots, t_s \rrbracket$. This new valuation $val_1$ satisfies CPI: since $val_1 \upharpoonright (|RS|)$ already satisfies CPI (due to compactness of $val$), it is enough to observe that for any $A' \in V$ with $R_j$.Current $= R_k$.Current $= A'$, $1 \le j < k \le |RS|$ and $R_k$.Followup $\ne \emptyset$, we have that $t_1, \ldots, t_s$ are $A'$-occurrence free since $c$ satisfies property I. This new valuation also satisfies CPII: since $val_1 \upharpoonright (|RS|)$ already satisfies CPII (due to compactness of $val$), it is enough to observe that for any $i < |RS|$, if all parse trees occurring in $val_1$ at level $i + 1$ or lower are $R_i$.Current-occurrence free, then by property II, $t$ is $R_i$.Current-occurrence free and hence so are $t_2, \ldots, t_s$. Let $val'$ be the compact valuation given by Lemma 12 such that $\Pi(val') = \Pi(val_1)$. We take $RS' = RS_1 \cdot (A, \llbracket A_1, \ldots, A_r \rrbracket) \cdot (A'_1, \llbracket A'_2, \ldots, A'_s \rrbracket)$, $w' = w(t)$, $val'$ is the one obtained above by applying Lemma 12 and $t' = t_1$.

The new configuration $c' = (RS', ww', val', t')$ given above satisfies property I: let $A' \in V$ be such that $R_j$.Current $= R_k$.Current $= A'$, $1 \le j < k \le |RS'|$ and $R_k$.Followup $\ne \emptyset$. If $k \le |RS|$, then $t_1$ is $A'$-occurrence free since $c$ satisfies property I. If $k = |RS'|$ (in which case $A' = A'_1$) and $j = |RS|$, then $A' = A'_1 = A$. $R_k$.Followup $\ne \emptyset$ implies that $s \ge 2$ and hence $t_1$ is $A'$-recurrence free by the choice of $t_1$ made at the beginning of this case. Since $A' = A'_1$ and $t_1$ is rooted at $A'_1$, immediate subtrees of $t_1$ are $A'$-occurrence free. If $k = |RS'|$, $j < |RS|$ and there is a parse tree occurring in $val$ at level $j + 1$ or lower that is not $A'$-occurrence free, then since $c$ satisfies property III, $t$ is $A'$-recurrence free and since $t_1$ is rooted at $A'_1 = A'$, immediate subtrees of $t_1$ are $A'$-occurrence free. If $k = |RS'|$, $j < |RS|$ and all parse trees occurring in $val$ at level $j + 1$ or lower are $A'$-occurrence free, then since $c$ satisfies property II, $t$ is $A'$-occurrence free and hence so is $t_1$.

The new configuration also satisfies property II: suppose for some $1 \le i < |RS'|$, all parse trees occurring in $val'$ at level $i + 1$ or lower are $R_i$.Current-occurrence free and $R_{i+1}$.Followup $\ne \emptyset$. If $i < |RS'| - 1$, then $V[val \downarrow i + 1] = V[val_1 \downarrow i + 1] \subseteq V[val' \downarrow i + 1]$ implies that all parse trees occurring in $val$ at level $i + 1$ or lower are $R_i$.Current-occurrence free. Since $c$ satisfies property II, $t$ is $R_i$.Current-occurrence free and so is $t_1$. For $i = |RS'| - 1$, $s \ge 2$ implies that $t_1$ is $R_i$.Current-occurrence free by the choice of $t_1$ made at the beginning of this case. For $i = |RS'| - 1$, $s = 1$ implies that $R_{i+1}$.Followup $= \emptyset$ and hence this case does not apply.

Finally, the new configuration satisfies property III: for some $1 \le i < |RS'|$, suppose there is a parse tree occurring in $val'$ at level $i + 1$ or lower that is not $R_i$.Current-occurrence free and $R_{i+1}$.Followup $\ne \emptyset$. If $i < |RS|$ and a parse tree occurring in $val$ at level $i + 1$ or lower is not $R_i$.Current-occurrence free, then since $c$ satisfies property III, $t$ is $R_i$.Current-recurrence free and hence so is $t_1$. If $i < |RS|$ and all parse trees occurring in $val$ at level $i + 1$ or lower are $R_i$.Current-occurrence free, then since $c$ satisfies property II, $t$ is $R_i$.Current-occurrence free and so is $t_1$. If $i = |RS|$ and $s = 1$, then $R_{i+1}$.Followup $= \emptyset$ so this case does not apply. If $i = |RS|$ and $s \ge 2$, then $t_1$ is $R_i$.Current-recurrence free by the choice of $t_1$ we made at the beginning of this case. □

Now we are ready to prove that for every word $w$ generated by a CFG $G$, $\mathcal{A}(G)$ accepts a word $w'$ such that $\Pi(w) = \Pi(w')$.

Theorem 16. If a word $w$ can be derived in $G$ from the axiom $S$, then the automaton $\mathcal{A}(G)$ accepts some word $w'$ such that $\Pi(w') = \Pi(w)$.

Proof. Let $c = (RS, w, val, t)$ be a configuration. We will prove by induction on $|c|$ that there is a word $w_1$ such that $RS \stackrel{w_1}{\Longrightarrow}{}^{*} (\bot, \emptyset)$ and $\Pi(w_1) = \Pi(val) + \Pi(Y(t))$. For the base case $|c| = 1$, we can take $w_1 = \epsilon$.

For the induction step, suppose $|c| > 1$. Let $c' = (RS', ww', val', t')$ be the configuration given by Lemma 15. We have $RS \stackrel{w'}{\Longrightarrow} RS'$, $\Pi(val) + \Pi(Y(t)) = \Pi(val') + \Pi(w') + \Pi(Y(t'))$ and $|c'| < |c|$. By the induction hypothesis, there is a word $w_2$ such that $RS' \stackrel{w_2}{\Longrightarrow}{}^{*} (\bot, \emptyset)$ and $\Pi(w_2) = \Pi(val') + \Pi(Y(t'))$. Putting things together, we get $RS \stackrel{w'}{\Longrightarrow} RS' \stackrel{w_2}{\Longrightarrow}{}^{*} (\bot, \emptyset)$ and $\Pi(val) + \Pi(Y(t)) = \Pi(w') + \Pi(w_2)$. Now we can take $w_1 = w' w_2$ to complete the induction step.

Now we will prove the theorem. Let $t$ be a parse tree associated with the derivation of $w$ from $S$ so that $Y(t) = w$. Consider the configuration $((S, \emptyset), \epsilon, \{1 \mapsto \emptyset\}, t)$. From the above result, we get a word $w_1$ such that $(S, \emptyset) \stackrel{w_1}{\Longrightarrow}{}^{*} (\bot, \emptyset)$ and $\Pi(w_1) = \Pi(\{1 \mapsto \emptyset\}) + \Pi(Y(t)) = \Pi(w)$. □
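
The induction in the proof above can be read as a loop: apply a size-decreasing step until the size reaches 1, concatenating the words emitted along the way. The sketch below shows only this control structure. The functions `step` and `size` are hypothetical stand-ins for Lemma 15 and for $|c|$, and the toy instance at the end is not a reminder-sequence automaton.

```python
from collections import Counter


def run_to_bottom(c, step, size):
    """Iterate a size-decreasing step, collecting the emitted words.

    `step(c)` is assumed to return a pair (c_next, w_prime) with
    size(c_next) < size(c), mirroring properties IV and V of Lemma 15.
    """
    emitted = []
    while size(c) > 1:
        c_next, w_prime = step(c)
        # Termination relies on the strict decrease of the size.
        assert size(c_next) < size(c)
        emitted.append(w_prime)
        c = c_next
    return c, "".join(emitted)


# Toy instance: a configuration is the string still to be emitted.
def toy_size(c):
    return len(c) + 1


def toy_step(c):
    return c[1:], c[0]


final, word = run_to_bottom("abba", toy_step, toy_size)
print(final == "", Counter(word) == Counter("abba"))
```

In the toy instance the emitted word has the same Parikh image as the initial content, which plays the role of the invariant in property VI.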

Next we will prove the converse direction: if $\mathcal{A}(G)$ accepts a word $w$, then $G$ can generate a word $w'$ from $S$ such that $\Pi(w') = \Pi(w)$. Given a reminder sequence $RS$, we denote by $\llbracket RS \rrbracket$ the multiset over $V$ such that for all $A \in V$, $\llbracket RS \rrbracket(A) = \sum_{1 \le i \le |RS|} R_i.\mathrm{Followup}(A) + \llbracket R_{|RS|}.\mathrm{Current} \rrbracket(A)$. For any word $u$ over $\Sigma \cup V$, $u \upharpoonright V$ ($u \upharpoonright \Sigma$) is the word obtained from $u$ by replacing every occurrence of an element from $\Sigma$ ($V$) with $\epsilon$ respectively.
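
A short Python sketch of these two notions follows. It assumes a reminder sequence is a non-empty list of `(current, followup)` pairs with `followup` a `Counter` over variables, that words are lists of symbols, and that the special symbol (called `BOT` here, a hypothetical name) contributes nothing to the multiset.

```python
from collections import Counter

BOT = "BOT"  # hypothetical marker for the special symbol


def multiset_of(rs):
    """Sum of all Followup multisets plus the Current of the last element."""
    total = Counter()
    for _current, followup in rs:
        total += followup
    last_current = rs[-1][0]
    if last_current != BOT:  # assumption: BOT adds no variable
        total[last_current] += 1
    return total


def project(u, keep):
    """Erase from the word u every symbol that is not in `keep`."""
    return [x for x in u if x in keep]


variables, terminals = {"S", "A", "B"}, {"a", "b"}
rs = [("A", Counter({"B": 1}))]
u = ["a", "A", "b", "B"]
print(multiset_of(rs) == Counter(project(u, variables)))
print(project(u, terminals))
```

Here the sentential form `u` has exactly the variables recorded by the reminder sequence, which is the shape of the invariant used in the proof of Theorem 17.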

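Theorem 17. If $\mathcal{A}(G)$ accepts a word $w$, then $G$ can generate a word $w'$ from $S$ such that $\Pi(w') = \Pi(w)$.

Proof. We claim that if $(S, \emptyset) \stackrel{w}{\Longrightarrow}{}^{*} RS$, then $G$ can generate a word $u$ such that $\llbracket RS \rrbracket = \Pi(u \upharpoonright V)$ and $\Pi(w) = \Pi(u \upharpoonright \Sigma)$. The proof is by induction on the length $\ell$ of the run of $\mathcal{A}(G)$ on $w$. For the base case $\ell = 0$, we can take $u = S$.

For the induction step, suppose $(S, \emptyset) \stackrel{w'_1}{\Longrightarrow}{}^{*} RS_1 \stackrel{w'_2}{\Longrightarrow} RS$, where $RS_1 \stackrel{w'_2}{\Longrightarrow} RS$ is one of the transition relations used by $\mathcal{A}(G)$ (from Def. 5). By the induction hypothesis, $G$ can generate a word $u'$ such that $\llbracket RS_1 \rrbracket = \Pi(u' \upharpoonright V)$ and $\Pi(w'_1) = \Pi(u' \upharpoonright \Sigma)$.

Case 1: $A \to w_0 A_1 w_1 \cdots A_r w_r$ is a production with $r \ge 1$, $w'_2 = w_0 w_1 \cdots w_r$ and $RS_1 = RS_2 \cdot (A, \emptyset) \stackrel{w'_2}{\Longrightarrow} RS_2 \cdot (A_j, \llbracket A_1, \ldots, A_r \rrbracket \ominus \llbracket A_j \rrbracket) = RS$. Since $R_{|RS_1|}$.Current $= A$, $\Pi(u' \upharpoonright V)(A) \ge 1$. Let $u' = u_1 A u_2$. We can take $u = u_1 w_0 A_1 w_1 \cdots A_r w_r u_2$.

Case 2: $A \to w_0 A_1 w_1 \cdots A_r w_r$ is a production with $r \ge 1$, $w'_2 = w_0 w_1 \cdots w_r$, $v \ne \emptyset$ and $RS_1 = RS_2 \cdot (A, v) \stackrel{w'_2}{\Longrightarrow} RS_2 \cdot (A, v) \cdot (A_j, \llbracket A_1, \ldots, A_r \rrbracket \ominus \llbracket A_j \rrbracket) = RS$. Since $R_{|RS_1|}$.Current $= A$, $\Pi(u' \upharpoonright V)(A) \ge 1$. Let $u' = u_1 A u_2$. We can take $u = u_1 w_0 A_1 w_1 \cdots A_r w_r u_2$.

Case 3: $A \to w'_2$ is a production and $RS_1 = RS_2 \cdot (A, \emptyset) \stackrel{w'_2}{\Longrightarrow} RS_2 \cdot (\bot, \emptyset) = RS$. Since $R_{|RS_1|}$.Current $= A$, $\Pi(u' \upharpoonright V)(A) \ge 1$. Let $u' = u_1 A u_2$. We can take $u = u_1 w'_2 u_2$.

Case 4: $A \to w'_2$ is a production and $RS_1 = RS_2 \cdot (A, v \oplus \llbracket A' \rrbracket) \stackrel{w'_2}{\Longrightarrow} RS_2 \cdot (A', v) = RS$. Since $R_{|RS_1|}$.Current $= A$, $\Pi(u' \upharpoonright V)(A) \ge 1$. Let $u' = u_1 A u_2$. We can take $u = u_1 w'_2 u_2$.

Case 5: $RS_1 = RS_2 \cdot (A, v \oplus \llbracket A' \rrbracket) \cdot (\bot, \emptyset) \stackrel{\epsilon}{\Longrightarrow} RS_2 \cdot (A', v) = RS$. We can take $u = u'$. This completes the induction step and hence the claim is true.

Now we will prove the theorem. Suppose $(S, \emptyset) \stackrel{w}{\Longrightarrow}{}^{*} (\bot, \emptyset)$. By the above claim, $G$ can generate a word $u$ such that $\emptyset = \llbracket (\bot, \emptyset) \rrbracket = \Pi(u \upharpoonright V)$ and $\Pi(w) = \Pi(u \upharpoonright \Sigma)$. Hence $u$ contains no variables, and we can take $w' = u$. □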
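
The rewriting step $u' = u_1 A u_2 \mapsto u_1 w_0 A_1 w_1 \cdots A_r w_r u_2$ used in Cases 1 to 4 can be sketched as follows. Sentential forms are represented as lists of symbols. The function names and the example productions are hypothetical, and the choice of the first occurrence of $A$ is an arbitrary one made only for this illustration.

```python
from collections import Counter


def apply_production(u, head, body):
    """Replace the first occurrence of the variable `head` in u by `body`.

    Raises ValueError if `head` does not occur in u.
    """
    i = u.index(head)
    return u[:i] + body + u[i + 1:]


def parikh(u, alphabet):
    """Parikh image of the projection of u onto `alphabet`."""
    return Counter(x for x in u if x in alphabet)


terminals, variables = {"a", "b"}, {"S", "A", "B"}
u = ["S"]
u = apply_production(u, "S", ["a", "A", "b", "B"])  # S -> a A b B
u = apply_production(u, "A", ["a"])                 # A -> a
print(u, parikh(u, terminals), parikh(u, variables))
```

Each step adds the terminals of the applied production to the terminal projection and replaces one occurrence of the head by the variables of the body, matching the two equalities maintained by the claim.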

Acknowledgements The author would like to thank Pierre Ganty for helpful discussions. 